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Abstract. We describe a system for the automated certification of safety proper- 
ties of NASA software. The system uses Hoare-style program verification tech- 
nology to generate proof obligations which are then processed by an automated 
first-order theorem prover (ATP). For full automation, however, the obligations 
must be aggressively preprocessed and simplified. We describe the unique re- 
quirements this places on the ATP and demonstrate how the individual simpli- 
fication stages, which are implemented by rewriting, influence the ability of the 
ATP to solve the proof tasks. Experiments on more than 25,000 tasks were carried 
out using Vampire, Spass, and e-setheo. 


1 Introduction 

Software certification aims to show that the software in question satisfies a certain level 
of quality, safety, or security. Its result is a certificate , i.e., independently checkable 
evidence of the properties claimed. Certification approaches vary widely, ranging from 
code reviews to full formal verification, but the highest degree of confidence is achieved 
with approaches that are based on formal methods and use logic and theorem proving 
to construct the certificates. 

We have developed a certification approach which uses Hoare-style techniques to 
demonstrate the safety of aerospace software which has been automatically generated 
from high-level specifications. Our core idea is to extend the code generator so that it 
simultaneously generates code and detailed annotations, e.g., loop invariants. A verifi- 
cation condition generator (VCG) processes the annotated code and produces a set of 
safety obligations , which are provable if and only if the program is safe. An automated 
theorem prover (ATP) then discharges these obligations and the proofs, which can be 
verified by an independent proof checker, serve as certificates. 

In this paper, we describe and evaluate the application of ATPs to discharge the 
emerging safety obligations. This is a crucial aspect of our approach since its practica- 
bility hinges on a very high degree of automation. Our first hypothesis is that the current 
generation of high-performance ATPs is — in principle — already powerful enough for 
practical applications. However, this is still a very demanding area because the num- 
ber of obligations is potentially very large and program verification is generally a hard 
problem domain for ATPs. Our second hypothesis is thus that the application still needs 
to carefully preprocess the proof tasks to make them more tractable for ATPs. 

In our case, there are several factors which make a successful ATP application possi- 
ble. First, we certify separate aspects of safety and not full functional correctness. This 
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Fig. 2. Safety Formulas for Different Policies 


c , and a substitution sub(c), which captures how the command changes the environ- 
mental information relevant to the safety policy. Using cond and sub, the rules of the 
safety policy can then be derived systematically from the standard Hoare rules of the 
underlying programming language [2]. 

From our perspective, the safety conditions cond(c) are the most interesting aspect 
since they have the greatest bearing on the form of the proof obligations. Figure 2 sum- 
marizes the different formulas and the domain theories needed to reason about them. 
Both variable initialization and usage as well as array bounds certification are logically 
simple and rely just on propositional and simple arithmetic reasoning, respectively, but 
can require a lot of information to be propagated throughout the program. The symme- 
try policy requires reasoning about matrix expressions expressed as a first-order quan- 
tification over all matrix entries. The vector norm policy is formalized in terms of the 
summation over entries in a one-dimensional array, and involves symbolic reasoning 
over summation expressions. 


2.2 Generating Proof Obligations 

For certification purposes, the synthesis system annotates the code with mark-up infor- 
mation relevant to the selected safety policy. These annotations are part of the schema 
and thus instantiated in parallel with the code fragments. The annotations contain local 
information in the form of logical pre- and post-conditions and loop invariants, which 
is propagated throughout the code. The fully annotated code is then processed by the 
VCG, which applies the rules of the safety policy to the annotated code in order to gen- 
erate the safety conditions. As usual, the VCG works backwards through the code. At 
each line, safety conditions are generated and the safety substitutions are applied. The 
VCG has been designed to be “correct-by-inspection”, i.e., to be sufficiently simple so 
that it is straightforward to see that it correctly implements the rules of the logic. Hence, 
the VCG does not carry out any simplifications; in particular, it does not actually apply 
the substitutions (i.e., execute the specified replacements) but maintains explicit formal 
substitution terms. Consequently, the generated verification conditions (VCs) tend to 
be large and must be simplified separately; the more manageable simplified verification 
conditions (SVCs) which result are then processed by a first order theorem prover. The 
resulting proofs can be sent to a proof checker (e.g.. Ivy [13]). However, since most 
ATPs do not produce explicit proofs in a standardized format, we will not focus on 
proof checking here but concentrate on the simplification and theorem proving steps. 
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Fig. 3. Structure of a safety obligation 


The structure of a typical safety obligation (after substitution reduction and simpli- 
fication) is given in Figure 3. It corresponds to the initialization safety of an assignment 
within a nested loop. Most of the hypotheses consist of annotations which have been 
propagated through the code and are irrelevant to the line at hand. The proof obligation 
also contains the local loop invariants together with bounds on f or-loops. Finally, the 
conclusion is generated from the safety formula of the corresponding safety policy. 

2.3 Processing Proof Obligations and Connecting the Prover 

The simplified safety obligations are then exported as a number of individual proof 
obligations using TPTP first order logic syntax. A small script then adds the axioms of 
the domain theory, before the completed proof task is processed by the theorem prover. 
Parts of the domain theory are generated dynamically in order to facilitate reasoning 
with (small) integers. The domain theory is described in more detail in Section 3.3. 

The connection to a theorem prover is straightforward. For provers that do not ac- 
cept the TPTP syntax, the appropriate TFIP2X-converter was used before invoking the 
theorem prover. Run-time measurement and prover control (e.g., aborting provers) were 
performed with the same TPTP tools as in the CASC competition [19]. 

3 Experimental Setup 

3.1 Program Corpus 

As basis for the certification experiments we generated annotated programs from four 
different specifications which were written prior to and independently of the exper- 
iments. The size of the generated programs ranges from 431 to 1157 lines of com- 
mented C-code, including the annotations. Table 1 gives a more detailed breakdown. 
The first two examples are AutoFilter specifications, dsl is taken from the atti- 
tude control system of NASA’s Deep Space One mission [22]. iss specifies a com- 
ponent in a simulation environment for the Space Shuttle docking procedure at the 
International Space Station. In both cases, the generated code is based on Kalman-filter 
algorithms, which make extensive use of matrix operations. The other two examples 
are AutoBayes specifications which are part of a more comprehensive analysis of 



planetar} 7 nebula images taken by the Hubble Space Telescope (see [7,4] for more 
details), segm describes an image segmentation problem for which an iterative (nu- 
merical) statistical clustering algorithm is synthesized. Finally, gauss fits an image 
against a two-dimensional Gaussian curve. This requires a multivariate optimization 
which is implemented by the Nelder-Mead simplex method. The code generated for 
these two examples has a substantially different structure from the state estimation 
examples. First, the numerical optimization code contains many deeply nested loops. 
Also, some of the loops are convergence loops which have no fixed upper bounds 
but are executed until a dynamically calculated error value gets small enough. In con- 
trast, in the Kalman filter code, all loops are executed a fixed (i.e., known at synthesis 
time) number of times. Second, the numerical optimization code accesses all arrays 
element by element and contains no operations on entire matrices (e.g., matrix multi- 
plication). The example specifications and all generated proof obligations can be found 
at http : / /ase . arc . nasa . gov/autobayes/ i j car. 


3.2 Simplification 

Proof task simplification is an important and integral part of our overall architecture. 
However, as observed before [9,5, 17], simplifications — even on the purely proposi- 
tional level — can have a significant impact on the performance of a theorem proven In 
order to evaluate this impact, we used six different rewrite-based simplifiers to generate 
multiple versions of the safety obligations. We concentrate on rewrite-based simpli- 
fications rather than decision procedures because rewriting is easier to certify: each 
individual rewrite step can easily be double-checked independently. 

Baseline The baseline is given by the rewrite system % which eliminates the extra- 
logical constructs (including explicit formal substitutions) which the VCG employs dur- 
ing the construction of the safety obligations. Our original intention was to axiomatize 
these constructs in first-order logic and then (ab-) use the provers for this elimination 
step, but that turned out to be unfeasible. The main problem is that the combination 
with equality reasoning produces tremendous search spaces. 

Propositional Structure The first two “proper ’ simplification levels only work on 
the propositional structure of the obligations. 7v,=> splits the few but large obligations 
generated by the VCG into a large number of smaller obligations. It consists of two 
rewrite rules 'ix • P A Q ^ (Vx • P) A (Va; • Q) and P (Q A R) ^ (P => 
Q) A (P => R) which distribute universal quantification and implication, respectively 
over conjunction. Each of the resulting conjuncts is then treated as an independentproof 
task T piop simplifies the propositional structure of the obligations more aggressively. It 
uses the rewrite rules 
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in addition to the two rules in The rules have been chosen so that they preserve 
the overall structure of the obligations as far as possible; in particular, conjunction and 
disjunction are not distributed over each other and implications are not eliminated. Their 
impact on the clausifier should thus be minimal. 

Ground Arithmetics This simplification level additionally handles common exten- 
sions of plain first-order logic, i.e., equality, orders, and arithmetics. The rewrite sys- 
tem contains rules for the reflexivity of equality and partial orders as well as the 
ineflexivity of strict orders, although the latter rules are not invoked on the example 
obligations. In addition, it normalizes orders into < and > using the (obvious) rules 

x > y ^ y < x —'X>y^x<y 
x <y^ y> x ^x<y^x>y 

The choice of the specific orders is arbitrary; choosing for example < instead of > 
makes no difference. However, a further normalization by elimination of either the par- 
tial or the strict order (e.g., using a rule x<y^x<yVx = y) leads to a substantial 
increase in the formula size and thus proves to be counter-productive. 

also contains rules to evaluate ground integer operations (i.e., addition, subtrac- 
tion, and multiplication), equalities, and partial and strict orders. Moreover, it converts 
addition and subtraction with one small integer argument (viz. less than five) into Press- 
burger notation, using rules of the form x+W succ(x) and x — 1 ^ pred(x). For 
many safety policies (e.g., init), such terms are introduced by relativized bounded quan- 
tifiers (e.g., Vx-Q < x < n— 1 => P(x)) and contain the only occurrences of arithmetic 
operators. A final group of rules handles the interaction between succ and pred, as well 
as with the orders. 

succ{pred(x)) ^ x pred(succ(x)) ^ x 

succ(x) < y ^ x < y succ(x) > y x > y 

x < pred(y) ^ x < y x > pred(y) x > y 

Language-Specific Simplification The next level handles constructs which are spe- 
cific to the program verification domain, in particular array-expressions and conditional 
expressions, encoding the necessary parts of the language semantics. The rewrite system 
7L y adds rewrite formulations of McCarthy's array axioms [12], i.e., sel(upd(a, z, v), j) ^ 
i = j ? v : sel(a , j) for one-dimensional arrays and similar forms for higher-dimensional 
arrays. Also, some safety policies are formulated using anays of a given dimension- 
ality which are uniformly initialized with a specific value. These are represented by a 
constarray- term, for which similar rules are required, e.g., sel(constarray(v , d), i) v. 

Nested sel/upd- terms, which result from sequences of individual assignments to the 
same array, lead to nested conditionals which in turn lead to an exponential blow-up 
during the subsequent language normalization step. 7^ thus also contains two rules 
true ? x : y ^ x and false ? x : y y to evaluate conditionals. 

In order to evaluate the effect of these domain-specific simplifications properly, 
we also experimented with a rewrite system 7^, which applies the two sel-mlcs in 
isolation. 

Policy-Specific Simplification The most aggressive simplification level 7^ iicy uses a 
number of rules which are fine-tuned to handle situations that frequently arise with 



specific safety policies. The mzf-policy requires a rule 

Vz • 0 < x < n => (x t 1 0 A . . . A x ^ n => P) true 

which is derived from the finite induction axiom to handle the result of simplifying 
nested sellupd- terms. For inuse , we need a single rule def =use ^ false, which follows 
from the fact that the two tokens def and use used by the policy are distinct. For symm , 
we make use of a lemma about the symmetry of specific matrix expressions: A J r BCB T 
is already symmetric if (but not only if) the two matrices A and C are symmetric, 
regardless of the symmetry of B. The rewrite rule 

sel(A + BCB t , L j) = **/( 4 + j, i) 

^ ^/(A^y) = sel(A.j.i) A sel{CA,j) ~ sel(C,j, i) 

formulates this lemma in an element-wise fashion. 

For the norm-policy , the rules become a lot more specialized and complicated. Two 
rules are added to handle the inductive nature of finite summations: 

£d (0) *-° 

P a * = £d (n) Q(i) =>* + Q(n) = £I£ 0 Q(0 

- r A * = ££f n) Q(i) => Elo <?(0 = £.=0 Q(0 

The first rule directly implements the base case of the induction; the second rule, which 
implements the step case, is more complicated. It requires alpha-conversion for the 
summations as well as higher-order matching for the body expressions. However, both 
are under explicit control of this specific rewrite rule and not the general rewrite engine, 
and are implemented directly as Prolog-predicates. A similar rule is required in a very 
specific situation to substitute an equality into a summation: 

P A (Vi • 0 < i < n => x = sel(f, i)) => J2i=o se Kf> i) = 1 
P A (Vi • 0 < i < n =$> x = ,se/(/, i)) => x = 1 

The above rules capture the central steps of some of the proofs for the norm-policy and 
mirror the fact that these are essentially higher-order inferences. 

Another set of rewrite rules handles all occurrences of the random number generator 
by asserting that the number is within its given range, i.e., I < rand(l , u) < u. 

Normalization The final preprocessing step transforms the obligations into pure first- 
order logic. It eliminates conditional expressions which occur as top-level arguments 
of predicate symbols, using rules of the form P ?T : F = R ^ (P T = R) A 
(-»P => F = R) and similarly for partial and strict orders. A number of congruence 
rules move nested occurrences of conditional expressions into the required positions. 
Finite summations, which only occur in obligations for the norm-policy, are represented 
with a de Bruijn- style variable-free notation. 

Control The simplifications are performed by a small but reasonably efficient rewrite 
engine implemented in Prolog. This engine does not support full AC-rewriting but flat- 
tens and orders the arguments of AC-operators. The rewrite rules, which are imple- 
mented as Prolog-clauses, then do their own list matching but can take the list ordering 



into account. The rules within each system axe applied exhaustively. However, the two 
most aggressive simplification levels and 7^, icy are followed by a “clean-up” phase. 
This consists of the language normalization followed by the propositional simplifica- 
tions T prop and the finite induction rule. Similarly, T ijnr is followed by the language 
normalization and then by to split the obligations. 

3.3 Domain Theory 

Each safety obligation is supplied with a first-order domain theory. In our case, the 
domain theory consists of a fixed part which contains 44 axioms, and a set of axioms 
which is generated dynamically for each proof task. The static set of axioms defines 
the usual properties of equality and the order relations, as well as axioms for simple 
Pressburger arithmetics and for the domain-specific operators (e.g., sel/upd or rand). 
The dynamic axioms are added because most theorem provers cannot calculate with 
integers, and to avoid the generation of large terms of the form succ(. . . (i'wcc(O) . . .). 
For all integer literals n, m in the proof task, we generate the corresponding axioms of 
the form m > n. For small integers (in our examples n < 6), we even generate axioms 
for explicit successor-terms, i.e., n = succ n (0) and add a finite induction schema of the 
form Vx : 0<r<n=^(r = OVr = lV...Vx-n).In our application domain, 
these axioms are needed for some of the matrix operations; thus n can be limited to the 
maximal (statically known) size of the matrices. 

3.4 Theorem Provers 

For the experiments, we selected several high-performance theorem provers for untyped 
first-order formulas with equality. Most of the provers participated at the CASC-19 

[18] proving competition in the FOL-category. We used two versions of e-setheo which 
have both been derived from the CASC- version. For e-setheo-csp03F, the clarification 
module has been changed and instead of the clausifier provided by the TPTP toolset 

[19] , FLOTTER V2. 1 [21, 20] was used to convert the formulas into a set of clauses, e- 
setheo-new is a recent development version with several improvements over the original 
e-setheo-csp03 version. Both versions of Vampire [16] have been taken directly “out of 
the box” — they are the versions which were running during CASC-19. Spass 2.1 was 
obtained from the developer’s website [20]. 

In the experiments, we used the default parameter settings and none of the special 
features of the provers. For each proof obligation, we limited the run-time to 60 sec- 
onds; the CPU-time actually used was measured with the TPTP-tools on a 2.4GHz dual 
processor standard PC with 4GB memory. 

4 Empirical Results 

4.1 Generating and Simplifying Obligations 

Table 1 summarizes the results of generating the different versions of the safety obliga- 
tions. For each of the example specifications, it lists the size of the generated programs 



(without annotations), the applicable safety policies, the respective size of the generated 
annotations (before propagation), and then, for each simplifier, the elapsed time and the 
number of generated obligations. 
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Table 1 . Results of generating safety obligations 


The elapsed times include synthesis of the programs as well as generation, simpli- 
fication, and file output of the safety obligations; synthesis alone accounts for approx- 
imately 90% of the times listed under the array safety policy. In general, the times for 
generating and simplifying the obligations are moderate compared to both generating 
the programs and discharging the obligations. All times are CPU- times and have been 
measured in seconds using the Unix time-command. 

Almost all of the generated obligations are valid, i.e., the generated programs are 
safe. The only exception is the inuse-policy which produces one invalid obligation for 
each of the dsl and iss examples. This is a consequence of the respective specifica- 
tions which do not use all elements of the initial state vectors. The invalidity is confined 
to a single conjunct in one of the original obligations, and since none of the rewrite 
systems contains a distributive law, the number of invalid obligations does not change 
with simplification. 

The first four simplification levels show the expected results. The baseline yields 
relatively few but large obligations which are then split up by 7y j=i> into a much larger 
(on average more than an order of magnitude) number of smaller obligations. The next 
two levels then eliminate a large fraction of the obligations. Here, the propositional 
simplifier 7^ rop alone already discharges between 50% and 90% of the obligations while 
the additional effect of evaluating ground arithmetics (T cvai ,) is much smaller and gener- 
ally well below 25%. The only significant difference occurs for the array-policy where 
more than 80% (and in the case of ds 1 even all) of the remaining obligations are re- 
duced to true. This is a consequence of the large number of obligations which have the 
form -m < n P for an integer constant n representing the (lower or upper) bound 




of an array. The effect of the domain-specific simplifications is at first glance less clear. 
Using the array-rules only, T my *, generally leads to an increase over 7y <= > in the number 
of obligations; this even surpasses an order of magnitude for the yvmm-policy. However, 
in combination with the other simplifications (7^ av ), most of these obligations can be 
discharged again, and we generally end up with less obligations than before; again, the 
symm-policy is the only exception. The effect of the final policy-specific simplifications 
is, as should be expected, highly dependent on the policy. For in use and norm a fur- 
ther reduction is achieved, while the rules for init and symm only reduce the size of the 
obligations. 

4.2 Running the Theorem Provers 

Table 2 summarizes the results obtained from running the theorem provers on all proof 
obligations (except for the invalid obligations from the inuse-policy), grouped by the 
different simplification levels. Each line in the table corresponds to the proof tasks orig- 
inating from a specific safety policy (array, init, inuse, symm, and norm). Then, for each 
proven the percentage of solved proof obligations and the total CPU-time are given. The 
last two columns give the maximum and minimum percentage of solved tasks. 

For the fully simplified version (7^), all provers are able to find proofs for all tasks 
originating from at least one safety policy; e-setheo-csp03F can even discharge all the 
emerging safety obligations This result is central for our application since it shows that 
current ATPs can in fact be applied to certify the safety of synthesized code, confirming 
our first hypothesis. 

For the unsimplified safety obligations, however, the picture is quite different. Here, 
the provers can only solve a relatively small fraction of the tasks and leave an unaccept- 
ably large number of obligations to the user. The only exception is the array-policy, 
which produces by far the simplest safety obligations. This confirms our second hy- 
pothesis: aggressive preprocessing is absolutely necessary to yield reasonable results. 

Let us now look more closely at the different simplification stages. Breaking the 
large original formulas into a large number of smaller but independent proof tasks 
(7v,=>) boosts the relative performance considerably. However, due to the large absolute 
number of tasks, the absolute number of failed tasks also increases. With each additional 
simplification step, the percentage of solved proof obligations increases further. Inter- 
estingly, however, 7v,=> and seem to have the biggest impact on performance. The 
reason seems to be that equality reasoning on deeply nested terms and formula struc- 
tures can then be avoided, albeit at the cost of the substantial increase in the number of 
proof tasks. The results with the simplification strategy , which only contains the 
language-specific rules, also illustrates this behavior. The norm-policy clearly produces 
the most difficult proof obligations, requiring essentially inductive and higher-order rea- 
soning. Here, all simplification steps are required to make the obligations go through 
the first-order ATPs. 

The results in Table 2 also indicate there is no single best theorem proven Even vari- 
ants of the “same” prover can differ widely in their results. For some proof obligations, 
the choice of the clarification module makes a big difference. The TPTP-converter 
implements a straightforward algorithm similar to the one described in [11]. Flotter 
has a highly elaborate conversion algorithm which performs many simplifications and 
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5925.0 

99.8 

217.0 

99.9 

240.5 

99.8 

152.4 

99.9 

94.2 


init 

3177 

88.4 

3969.4 

91.7 

20784.8 

97.4 

8732.2 

95.0 

14482.2 

93.5 

14203.4 

97.4 

88.4 


in-use 

1123 

59.3 

819.1 

96.4 

4100.3 

99.1 

1733.5 

95.3 

4183.7 

94.3 

4206.8 

99.1 

59.3 


symm 

286 

93.4 

1785.9 

90.6 

2341.0 

88.5 

3638.7 

90.2 

3315.8 

91.3 

1789.2 

93.4 

88.5 


norm 

155 

85.8 

1422.1 

73.5 

2552.5 

84.5 

1572.0 

87.7 

1359.9 

87.1 

1276.0 

87.7 

73.5 

T 

^prop 

array 

275 

99.3 

278.2 

76.4 

4080.8 

99.3 

157.5 

99.3 

187.5 

99.3 

132.6 

99.3 

76.4 


init 

919 

94.7 

4239.4 

73.0 

17472.2 

92.8 

5469.7 

84.9 

10598.0 

83.2 

10546.8 

94.7 

73.0 


in-use 

177 

86.4 

1854.0 

77.4 

2768.2 

94.9 

1008.3 

70.1 

3806.2 

65.0 

3960.6 

94.9 

65.0 


symm 

56 

66.1 

1476.2 

51.8 

1944.4 

48.2 

1911.3 

58.9 

1596.7 

58.9 

1424.8 

66.1 

48.2 


norm 

41 

46.3 

1361.2 

0.0 

2483.3 

41.5 

1478.2 

53.7 

1286.7 

51.2 

1275.3 

53.7 

0.0 

-^eval 

array 

28 

100.0 

16.2 

100.0 

19.7 

100.0 

10.4 

100.0 

12.7 

100.0 

1.7 

100.0 

100.0 


init 

790 

94.6 

3944.2 

94.1 

8288.0 

93.3 

4380.1 

82.5 

10239.0 

82.0 

9040.2 

94.6 

82.0 


in-use 

172 

86.0 

1852.2 

83.1 

2305.2 

94.8 

1023.1 

69.8 

3718.1 

67.4 

3561.1 

94.8 

67.4 


symm 

56 

66.1 

1451.1 

66.1 

1500.4 

51.8 

1716.0 

62.5 

1455.5 

58.9 

1389.8 

66.1 

51.8 


norm 

30 

53.3 

859.4 

13.3 

1575.8 

50.0 

940.5 

66.7 

736.7 

53.3 

858.0 

66.7 

13.3 

T 

x array 

array 

28 

100.0 

15.4 

100.0 

19.8 

100.0 

10.4 

100.0 

12.7 

100.0 

1.7 

100.0 

100.0 


init 

582 

100.0 

527.6 

100.0 

823.9 

99.7 

875.8 

100.0 

1401.3 

99.0 

785.1 

100.0 

99.0 


in-use 

47 

100.0 

323.9 

100.0 

343.2 

100.0 

171.3 

100.0 

262.6 

87.2 

525.2 

100.0 

87.2 


symm 

1337 

100.0 

1104.3 

99.9 

1629.3 

99.4 

746.4 

99.1 

963.9 

99.0 

922.7 

100.0 

99.0 


norm 

32 

59.4 

678.4 

18.8 

1583.1 

59.4 

709.7 

62.5 

791.7 

50.0 

858.6 

62.5 

18.8 

T * 

■* array 

array 

1457 

99.9 

916.4 

94.2 

5918.0 

99.9 

210.8 

99.9 

240.6 

99.9 

153.1 

99.9 

94.2 


init 

3825 

99.7 

3412.3 

96.3 

13536.1 

99.5 

4574.9 

99.8 

4952.1 

98.4 

6000.1 

99.8 

96.3 


in-use 

3089 

99.8 

2438.4 

99.4 

5139.0 

99.8 

889.2 

99.8 

793.5 

99.6 

925.9 

99.8 

99.4 


symm 

6403 

99.9 

5317.4 

99.7 

11787.7 

99.7 

3385.1 

99.6 

3277.3 

99.6 

1807.0 

99.9 

99.6 


norm 

157 

86.0 

1306.8 

72.6 

2670.8 

86.0 

1351.3 

86.6 

1449.9 

86.0 

1276.2 

86.6 

72.6 

Tpolicy 

array 

26 

100.0 

15.0 

100.0 

17.7 

100.0 

9.9 

100.0 

12.0 

100.0 

1.6 

100.0 

100.0 


init 

582 

100.0 

529.2 

100.0 

827.9 

99.5 

875.2 

100.0 

1418.9 

99.0 

782.5 

100.0 

99.0 


in-use 

20 

100.0 

281.7 

100.0 

329.7 

100.0 

170.7 

100.0 

262.6 

70.0 

524.8 

100.0 

70.0 


symm 

1345 

100.0 

1 104.6 

99.9 

1640.5 

99.4 

760.0 

99.1 

1048.8 

99.0 

926.9 

100.0 

99.0 


norm 

14 

100.0 

9.0 

57.1 

375.8 

100.0 

26.2 

100.0 

108.0 

71.4: 

241.8 

100.0 

57.1 


Table 2. Certification results and times 


avoids exponential increase in the number of generated clauses. This effect is most 
visible on the unsimplified obligations (e.g., 7® under init ), where Spass and e-setheo- 
csp03F — which both use the Flotter clausifier — perform substantially better than the 
other provers. 

Since our proof tasks are generated directly by a real application and are not “hand- 
picked” for certain properties, a large number of them is (almost) trivial — even in the 
unsimplified case. Figure 4 shows the resources required for the proof tasks as a series of 




pie chans. For each simplification stage, we show the percentage of proof tasks which 
are “very simple"’ (i.e., T proo j < Is, white), those which require between 1 and 10 
seconds of run time (light gray), and the difficult ones (dark gray). The percentage of 
tasks which fail within the 60s time limit is displayed in black. All numbers are obtained 
with e-setheo-csp03F; the figures for the other provers look similar. With additional 
preprocessing and simplification of the proof obligations, the number of failing proof 
tasks decreases sharply from approximately 16% to zero and the number of easy tasks 
increases substantially. This demonstrates the advantages of aggressive simplification 
of the proof tasks. 



Fig. 4. Distribution of short (T pwof < Is, white), medium (T pmof < 10s, light grey), long ( T proof < 
60s, dark grey) proofs, and failing tasks (black) for the various simplification staaes (e-setheo- 
csp03F). 


4.3 Difficult Proof Tasks 

Since all proof tasks are generated in a uniform manner through the application of a 
safety policy by the VCG, it is obvious that many of the difficult proof tasks share 
some similarities in structure. We have identified three classes of hard examples; these 
classes are directly addressed by the rewrite rules of the policy-specific simplifications 
(see Section 3.2). 

Most safety obligations generated by the VCG are of the form A ^ B\ A . . . A B n 
where the B t are variable disjoint. These obligations can be split up into n smaller 
proof obligations of the form A Bi and most theorem provers can then handle these 
smaller independent obligations much more easily than the large original. 

The second class contains formulas of the form syrrun{r) => symm(diag-updates(r )). 
Here, r is a matrix variable which is updated along its diagonal, and we need to show 
that r remains symmetric (as defined in Section 2.1) after the updates. For a 2x2 matrix 
and two updates (i.e., r 0 o = x and r n = y ), we obtain the following simplified version 
of an actual proof task: 

\/iJ • (0 < Lj < 1 =» sel(r,ij) = sel(rj,i)) => 

Vt, j * (0 < i,j < 1 => 

sel(upd(upd(r , 1, 1, y), 0, 0,x), ij) = sel(upd{upd{r , 1, 1, y), 0, 0, x)J y i)). 

This pushes the provers to their limits — e-setheo cannot prove this while Spass succeeds 
here but fails if the dimensions are increased to 3x3, or if three updates are made. In 


our examples, matrix dimensions up to 6x6 with 36 updates occur, yielding large proof 
obligations of this specific form which are not provable by current ATPs without further 
preprocessing. 

Another class of trivial but hard examples, which frequently shows up in the im- 
policy, also results from the expansion of deeply nested seljupd- terms. These problems 
have the form 

Vi, j ■ 0 < i < n A 0 < j < n => (i ^ 0 A j 0 A . . . i ^ n A j n => false) 
and soon become intractable for the clausifier, even for small n (n = 2 or n = 3). 


5 Conclusions 

We have described a system for the automated certification of safety properties of 
NASA state estimation and data analysis software. The system uses a generic VCG to- 
gether with explicit safety policies to generate policy-specific safety obligations which 
are then automatically processed by a first-order ATP. We have evaluated several state- 
of-the-art ATPs on more than 25,000 obligations generated by our system. With “out- 
of-the-box” provers, only about two-thirds of the obligations could be proven. However, 
after aggressive simplification, most of the provers could solve all emerging obligations. 
In order to see the effects of simplification more clearly, we experimented with specific 
preprocessing stages. 

It is well-known that, in contrast to traditional mathematics, software verification 
hinges on large numbers of mathematically shallow but structurally complex proof 
tasks, yet current provers are not well suited to this. Since the propositional structure 
of a formula is of great importance, we believe that clausification algorithms should 
integrate more simplification and split goal tasks into independent subtasks. 

Certain application-specific constructs (e.g., seJ/upd) can easily lead to proof tasks 
which cannot be handled by current ATPs. The reason is that simple manipulations 
on deep terms, when combined with equational reasoning, can result in a huge search 
space. Although specific parameter settings in a prover might overcome this problem, 
this would require a deep knowledge of the individual theorem provers, and in our 
experiments, we did not use any specific features or parameter settings for the individual 
theorem provers. 

With our approach to certification of auto-generated code, we are able to automati- 
cally produce safety certificates for code of considerable length and structural complex- 
ity. By combining rewriting with state-of-the-art automated theorem proving, we obtain 
a safety certification tool which compares favorably with tools based on static analysis 
(see [3] for a comparison). 

Our current efforts focus on integrating additional safety properties and extending 
the approach to synthesized code that has been modified manually. 
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